PPL command expression implementation for geoip #3228

Open
wants to merge 67 commits into base: main

Conversation

andy-k-improving
Contributor

@andy-k-improving andy-k-improving commented Jan 1, 2025

Description

Introduces a new PPL command expression, geoip, that looks up geospatial information for a given IPv4 or IPv6 address. The lookup result is returned as a tuple of key/value pairs, with each attribute name as the key and the corresponding location detail as the value.

In this setup, the SQL plugin acts as a thin client: it relays the IPEnrichment request to the OpenSearch Geospatial plugin within the same cluster.
The implementation details and the interface exposed on the Geospatial side can be found in:
opensearch-project/geospatial#700

Internally this functionality is achieved by:

  • Adding a no-op OpenSearchFunctionExpression marker to identify an expression that has no default implementation on other runtimes (e.g. Prometheus).
  • Updating OpenSearchIndex to provide an OpenSearch-specific handler for the eval operator and its expressions when OpenSearch is used as the storage engine.
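
The marker idea in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: Expression, FieldRef, MarkerFunctionExpression and needsEngineHandler are hypothetical stand-ins for the plugin's real types, and returning null from the no-op is an assumption.

```java
import java.util.List;
import java.util.Map;

public class MarkerSketch {
    /** Hypothetical stand-in for the core module's expression type. */
    interface Expression {
        Object valueOf(Map<String, Object> row);
    }

    /** Ordinary expression: reads a field from the current row. */
    static final class FieldRef implements Expression {
        private final String name;
        FieldRef(String name) { this.name = name; }
        @Override public Object valueOf(Map<String, Object> row) { return row.get(name); }
    }

    /** Marker: carries the function name and arguments, with no real logic in core. */
    static final class MarkerFunctionExpression implements Expression {
        private final String functionName;
        private final List<Expression> arguments;
        MarkerFunctionExpression(String functionName, List<Expression> arguments) {
            this.functionName = functionName;
            this.arguments = List.copyOf(arguments);
        }
        String getFunctionName() { return functionName; }
        List<Expression> getArguments() { return arguments; }
        // Assumption: the no-op evaluates to null; the real class may behave differently.
        @Override public Object valueOf(Map<String, Object> row) { return null; }
    }

    /** A storage engine can test for the marker to decide whether it must step in. */
    static boolean needsEngineHandler(Expression e) {
        return e instanceof MarkerFunctionExpression;
    }

    public static void main(String[] args) {
        Expression ip = new FieldRef("ip");
        Expression geo = new MarkerFunctionExpression("geoip", List.of(ip));
        System.out.println(needsEngineHandler(ip));
        System.out.println(needsEngineHandler(geo));
    }
}
```

Because the marker lives in the generic module and does nothing there, a runtime without a handler never needs an OpenSearch client on its classpath.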

At runtime, all eval expressions are passed to OpenSearchIndex.visitEval(), and the OpenSearchEvalOperator class picks up the call. It evaluates ordinary eval expressions as is and handles each occurrence of OpenSearchFunctionExpression separately, reading the function name and arguments and executing the appropriate business logic.
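
A rough sketch of that runtime flow, under stated assumptions: the types below are simplified stand-ins rather than the plugin's classes, and the ipLookup function is a hypothetical placeholder for the client call to the Geospatial plugin.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class EvalDispatchSketch {
    /** Hypothetical stand-in for the core expression type. */
    interface Expression {
        Object valueOf(Map<String, Object> row);
    }

    /** Hypothetical marker carrying a function name and a single argument expression. */
    static final class MarkerExpression implements Expression {
        final String functionName;
        final Expression argument;
        MarkerExpression(String functionName, Expression argument) {
            this.functionName = functionName;
            this.argument = argument;
        }
        // No-op in the generic module (assumed to evaluate to null).
        @Override public Object valueOf(Map<String, Object> row) { return null; }
    }

    /** Evaluates eval assignments against one row, handling markers separately. */
    static Map<String, Object> evalRow(
            Map<String, Object> row,
            Map<String, Expression> assignments,
            Function<String, Map<String, Object>> ipLookup) {
        Map<String, Object> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, Expression> pair : assignments.entrySet()) {
            Expression valueExpr = pair.getValue();
            Object value;
            if (valueExpr instanceof MarkerExpression) {
                MarkerExpression marker = (MarkerExpression) valueExpr;
                if ("geoip".equals(marker.functionName)) {
                    // Resolve the argument first, then delegate to the engine-specific lookup.
                    String ip = String.valueOf(marker.argument.valueOf(out));
                    value = ipLookup.apply(ip);
                } else {
                    throw new IllegalArgumentException("Unsupported function: " + marker.functionName);
                }
            } else {
                // Ordinary expressions are evaluated as they are.
                value = valueExpr.valueOf(out);
            }
            out.put(pair.getKey(), value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Stub lookup; a real one would call the Geospatial plugin through a client.
        Function<String, Map<String, Object>> stub = ip -> Map.of("queried_ip", ip);
        Map<String, Expression> assignments = new LinkedHashMap<>();
        assignments.put("upper", r -> String.valueOf(r.get("host")).toUpperCase());
        assignments.put("geo", new MarkerExpression("geoip", r -> r.get("ip")));
        System.out.println(evalRow(Map.of("host", "web-1", "ip", "192.0.2.1"), assignments, stub));
    }
}
```

Injecting the lookup as a function keeps the sketch testable without a cluster; 192.0.2.1 is a documentation-range address and the stub result is not real geolocation data.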

The marker class OpenSearchFunctionExpression is used because the actual implementation requires OpenSearch client connectivity at runtime, whereas the core module is meant to be generic. As a workaround, the expression is tagged as OpenSearchFunctionExpression in core and handled only in the opensearch Gradle module.

Related Issues

Resolves: #3037

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@andy-k-improving
Contributor Author

As per the offline discussion, I have separated out the integration related changes into #3244, in order to minimise the diff.

.github/workflows/integ-tests-with-geo.yml (outdated; 3 resolved threads)
integ-test/build.gradle (outdated; 2 resolved threads)
ppl/src/main/antlr/OpenSearchPPLParser.g4 (outdated; resolved)
docs/user/ppl/functions/geoip.rst (outdated; resolved)
:local:
:depth: 1

GEOIP
Collaborator

I thought GEOIP would go under ip.rst. Could we combine these files, or is there a reason we need to split "Geo IP Address Functions" from "IP Address Functions"?

Contributor Author

Not really; it makes more sense to merge both .rst files.

Contributor Author

Actually, my initial reasoning was this: it's not possible to create the test dataSource (not dataSet) programmatically during the docTest run (unlike the integration tests, where I can set up the dataSource in JUnit).
As a result, I will need to exclude geoip.rst from the DocTest, and if it were merged into ip.rst, the existing DocTest validation for ip.rst would be skipped as well.

docs/user/ppl/functions/geoip.rst (outdated; 2 resolved threads)
Expression valueExpr = pair.getValue();
ExprValue value;
if (valueExpr instanceof OpenSearchFunctionExpression openSearchFuncExpression) {
  if ("geoip".equals(openSearchFuncExpression.getFunctionName().getFunctionName())) {
Collaborator

Can we not call fetchIpEnrichment within the geoip function itself?

Contributor Author

I don't think that is feasible. Under the current design, the geoip function is located in the core Gradle sub-module, which is not meant to contain storage-engine-specific implementations.
As an alternative, I created OpenSearchEvalProcessor to serve as a central registry for all OpenSearch-specific eval operations.
WDYT?
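
To make the "central registry" idea concrete, here is a minimal sketch. It is an illustration only: EvalRegistrySketch, EngineFunction, register, lookup and invoke are hypothetical names, and the case-insensitive lookup is an assumption, not something the PR states.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class EvalRegistrySketch {
    /** Hypothetical handler: takes already-evaluated arguments and returns the result. */
    interface EngineFunction extends Function<List<Object>, Object> {}

    private final Map<String, EngineFunction> handlers = new HashMap<>();

    /** Registers an engine-specific handler under a function name. */
    void register(String functionName, EngineFunction handler) {
        handlers.put(functionName.toLowerCase(Locale.ROOT), handler);
    }

    /** Looks up a handler; empty if the engine has none for this function. */
    Optional<EngineFunction> lookup(String functionName) {
        return Optional.ofNullable(handlers.get(functionName.toLowerCase(Locale.ROOT)));
    }

    /** Invokes the handler, failing loudly when the function is unknown. */
    Object invoke(String functionName, List<Object> arguments) {
        return lookup(functionName)
                .orElseThrow(() -> new IllegalArgumentException(
                        "No engine handler for: " + functionName))
                .apply(arguments);
    }

    public static void main(String[] args) {
        EvalRegistrySketch registry = new EvalRegistrySketch();
        // Stub handler; a real one would query the Geospatial plugin through a client.
        registry.register("geoip", a -> Map.of("queried_ip", a.get(0)));
        System.out.println(registry.invoke("geoip", List.of("192.0.2.1")));
    }
}
```

With a registry like this, the eval operator only needs the function name from the marker expression; adding another engine-specific function means registering one more handler rather than extending an if/else chain.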

Signed-off-by: Andy Kwok <[email protected]>
andy-k-improving and others added 29 commits January 24, 2025 12:45
Co-authored-by: Andrew Carbonetto <[email protected]>
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]Add iplocation function to PPL for IP address geolocation
2 participants